knitr document van Steensel lab

Introduction

I sequenced the complete insert of the pDNA library of pMT02. I already extracted all sequences in front of the 3’ adapter from the sequencing data and used starcode to collapse identical sequences and count them. I now want to get an overview of how many insert sequences in the pDNA still match the designed inserts.
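To make the starcode step concrete, here is a minimal base R sketch of collapsing identical reads into sequence/count pairs. It only does exact collapsing (roughly what starcode gives when no mismatches are allowed); starcode itself can also cluster similar sequences. The reads are hypothetical toy data, and the column names `sequence` and `number` are chosen to mirror the column specification reported in the data import below.

```r
# Hypothetical toy reads standing in for the sequences extracted
# in front of the 3' adapter
reads <- c("ACGTAC", "ACGTAC", "TTGCAA", "ACGTAC", "TTGCAA", "GGGCCC")

# Count identical sequences exactly (no mismatches allowed)
counts <- table(reads)

# Reshape into a sequence/count data frame
collapsed <- data.frame(
  sequence = names(counts),
  number = as.vector(counts),
  stringsAsFactors = FALSE
)

# Most abundant sequence first
collapsed <- collapsed[order(-collapsed$number), ]
rownames(collapsed) <- NULL
print(collapsed)
```

The total of `number` always equals the number of input reads, which is a cheap sanity check after collapsing.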

Description of Data


Data processing

Path, Libraries, Parameters and Useful Functions

Custom functions

Functions used throughout this script.

Data import

## Parsed with column specification:
## cols(
##   sequence = col_character(),
##   number = col_double()
## )
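The message above shows that the starcode output was read as two columns, `sequence` (character) and `number` (double). A minimal base R sketch of the same import, assuming the file is tab-separated without a header (starcode's default output of sequence and count); the file and its contents here are hypothetical and written to a temporary path.

```r
# Write a small hypothetical starcode-style file: tab-separated, no header
starcode_file <- tempfile(fileext = ".tsv")
writeLines(c("ACGTACGT\t120", "TTGCAAGG\t45", "GGGCCCAA\t3"), starcode_file)

# Read it with the column types reported above:
# sequence as character, number as double
pdna_counts <- read.delim(
  starcode_file,
  header = FALSE,
  col.names = c("sequence", "number"),
  colClasses = c("character", "numeric")
)
str(pdna_counts)
```

Fixing `colClasses` up front avoids any type guessing, so a sequence column can never be misread as another type.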

Analysis

Match barcodes

Correlate to GC content

## Users can try to set parallel.cores = -1 to use all cores!
## Processing... 2020-11-11 16:27:27
## Calculating GC content...
## 
## Completed. 2020-11-11 16:27:33
## No trace type specified:
##   Based on info supplied, a 'scatter' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#scatter
## No scatter mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
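The log above suggests the GC content was computed with a package function (LncFinder is attached in the session info) and plotted with plotly. As a dependency-free sketch of the same idea, GC content is the fraction of G and C bases per sequence, and a rank-based (Spearman) correlation with read counts summarizes the trend. The data frame `pdna` and the helper `gc_content()` are hypothetical.

```r
# Hypothetical toy data: insert sequences with their read counts
pdna <- data.frame(
  sequence = c("ATATATGC", "GCGCATAT", "GGCCGGCC", "ATATATAT"),
  number = c(50, 80, 20, 100),
  stringsAsFactors = FALSE
)

# GC content = fraction of G or C bases in each sequence
gc_content <- function(seqs) {
  seqs <- toupper(seqs)
  gc <- nchar(gsub("[^GC]", "", seqs))
  gc / nchar(seqs)
}

pdna$gc <- gc_content(pdna$sequence)

# Rank-based correlation between GC content and read count
rho <- cor(pdna$gc, pdna$number, method = "spearman")
print(pdna)
print(rho)
```

Spearman is used here because read counts are usually heavily skewed, so a rank correlation is less sensitive to a few very abundant sequences than Pearson.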

Filter data

Match barcodes

Plot how many barcodes are found in the pDNA data
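The number behind such a plot is simply how many designed barcodes occur at least once in the sequencing data. A minimal sketch with hypothetical barcode vectors (`designed_bc`, `observed_bc`), using exact matching only:

```r
# Hypothetical designed barcodes and barcodes observed in the pDNA reads
designed_bc <- c("AAGGTTCC", "CCTTGGAA", "GATCGATC", "TTAACCGG")
observed_bc <- c("AAGGTTCC", "GATCGATC", "GATCGATC", "ACACACAC")

# Which designed barcodes appear at least once in the pDNA data?
found <- designed_bc %in% observed_bc
n_found <- sum(found)
frac_found <- mean(found)

cat(sprintf("%d of %d designed barcodes found (%.0f%%)\n",
            n_found, length(designed_bc), 100 * frac_found))

# Designed barcodes that are missing from the pDNA data
print(designed_bc[!found])
```

Observed barcodes that are not in the design (here `ACACACAC`) are ignored by this count; they would need a separate check if sequencing errors in the barcode matter.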

How many raw complete sequences match the design?
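This question can be answered at two levels: the share of unique sequences that match the design exactly, and the share of reads they account for. A minimal sketch with hypothetical designed and observed sequences:

```r
# Hypothetical designed insert sequences
design_seqs <- c("ACGTACGTAA", "TTGGCCAATT")

# Hypothetical observed complete sequences with starcode counts
observed <- data.frame(
  sequence = c("ACGTACGTAA", "ACGTACGTAT", "TTGGCCAATT", "GGGGGGGGGG"),
  number = c(70, 10, 15, 5),
  stringsAsFactors = FALSE
)

# Exact match against the design
observed$match <- observed$sequence %in% design_seqs

# Share of unique sequences that match
frac_unique <- mean(observed$match)

# Share of reads carried by matching sequences
frac_reads <- sum(observed$number[observed$match]) / sum(observed$number)

cat(sprintf("unique: %.2f, reads: %.2f\n", frac_unique, frac_reads))
```

Reporting both is useful because the two can differ a lot: many rare, slightly mutated sequences lower the unique-sequence fraction while barely affecting the read-weighted fraction.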

Identify barcodes that are attached to the wrong insert

Barcodes that are clearly attached to the wrong insert can be reassigned to the correct insert. Barcodes that are attached to a mixed population of inserts should be excluded from any analysis that uses this plasmid library.
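One way to separate these two cases is by purity: the fraction of a barcode's reads that support its most abundant insert. A minimal sketch, assuming a hypothetical table of barcode/insert pairs with read counts, a hypothetical design lookup `designed_insert`, and a hypothetical purity cut-off of 0.9 (not a value taken from this analysis):

```r
# Hypothetical barcode-insert pairs observed in the pDNA, with read counts
pairs <- data.frame(
  barcode = c("bc1", "bc2", "bc3", "bc3", "bc4"),
  insert  = c("insA", "insC", "insB", "insA", "insD"),
  number  = c(100, 90, 50, 45, 80),
  stringsAsFactors = FALSE
)

# Hypothetical design: the insert each barcode should carry
designed_insert <- c(bc1 = "insA", bc2 = "insB", bc3 = "insB", bc4 = "insD")

# Hypothetical cut-off: the dominant insert must carry at least this
# fraction of a barcode's reads for the assignment to be trusted
min_purity <- 0.9

classify_barcode <- function(df) {
  dominant <- df$insert[which.max(df$number)]
  purity <- max(df$number) / sum(df$number)
  if (purity < min_purity) {
    "exclude"   # mixed population of inserts
  } else if (dominant == designed_insert[[df$barcode[1]]]) {
    "correct"   # clean and attached to the designed insert
  } else {
    "replace"   # clean but attached to another insert: reassign
  }
}

# Apply the rule to each barcode separately
status <- sapply(split(pairs, pairs$barcode), classify_barcode)
print(status)
```

Barcodes tagged `replace` would be reassigned to their dominant insert, and barcodes tagged `exclude` dropped; the cut-off controls how strict "clearly wrong" is.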

Barcode re-evaluation

Investigate the mutational load of the barcodes with a good match
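A simple measure of mutational load is the edit (Levenshtein) distance between each observed insert and its designed sequence, weighted by read counts. The sketch below uses base R's `adist()`, which counts substitutions, insertions and deletions; the sequences and counts are hypothetical, and the session info suggests the original analysis used the stringdist package instead.

```r
# Hypothetical designed insert and observed variants for one barcode,
# with read counts
designed <- "ACGTACGTAC"
observed <- c("ACGTACGTAC", "ACGTTCGTAC", "ACGACGTAC", "ACGTACGTACG")
reads    <- c(80, 10, 6, 4)

# Levenshtein distance of each observed sequence to the design
edit_dist <- as.vector(adist(observed, designed))

# Read-weighted summaries of the mutational load
frac_mutated <- sum(reads[edit_dist > 0]) / sum(reads)
mean_edits   <- sum(reads * edit_dist) / sum(reads)

print(edit_dist)
cat(sprintf("mutated reads: %.2f, mean edits per read: %.2f\n",
            frac_mutated, mean_edits))
```

Here the three variants carry one substitution, one deletion and one insertion respectively, so each is at distance 1 from the design.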

Investigate the mutational load of the Trp53 constructs only, as they are especially difficult to amplify by PCR
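Restricting the analysis to Trp53 constructs amounts to filtering on the construct name and comparing the distribution of edit distances between groups. A minimal sketch, assuming hypothetical construct names that contain "Trp53" and hypothetical per-barcode edit distances:

```r
# Hypothetical per-barcode edit distances with construct names
load_df <- data.frame(
  insert = c("Trp53_1", "Trp53_2", "other_1", "other_2"),
  edit_distance = c(2, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Flag Trp53 constructs by name (literal match, no regex)
load_df$is_trp53 <- grepl("Trp53", load_df$insert, fixed = TRUE)

# Mean edit distance for Trp53 (TRUE) versus all other constructs (FALSE)
trp53_means <- tapply(load_df$edit_distance, load_df$is_trp53, mean)
print(trp53_means)
```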

Exporting data

# # Export barcodes that are attached to multiple inserts
# bc_exclude <- matching_df_exclude$barcode %>% unique()
# write.csv(bc_exclude, "/DATA/usr/m.trauernicht/projects/SuRE-TF/data/pDNA_insert_seq/bc_exclude.csv")
# 
# # Export barcodes that are attached to the wrong insert
# bc_replace <- pDNA_seq_incorrect %>% select(barcode, `bc-match`, `insert-match`) %>% unique()
# write.csv(bc_replace, "/DATA/usr/m.trauernicht/projects/SuRE-TF/data/pDNA_insert_seq/bc_replace.csv")
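If the export above is re-enabled, note that `write.csv()` writes row names by default, which adds an extra unnamed index column to the file. A minimal sketch of writing a barcode list without it, using a hypothetical barcode vector and a temporary file rather than the project paths:

```r
# Hypothetical barcodes flagged for exclusion (duplicate on purpose)
bc_exclude <- unique(c("bc3", "bc7", "bc3"))

out_file <- tempfile(fileext = ".csv")

# row.names = FALSE avoids writing an extra unnamed index column
write.csv(data.frame(barcode = bc_exclude, stringsAsFactors = FALSE),
          out_file, row.names = FALSE)

# Reading the file back yields exactly one column of barcodes
reread <- read.csv(out_file, stringsAsFactors = FALSE)
print(reread)
```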

Session Info

paste("Run time: ",format(Sys.time()-StartTime))
## [1] "Run time:  38.94081 secs"
getwd()
## [1] "/DATA/usr/m.trauernicht/projects/SuRE-TF/pDNA_insert_seq"
date()
## [1] "Wed Nov 11 16:27:37 2020"
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.7 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] tibble_3.0.1                plotly_4.9.2.1             
##  [3] LncFinder_1.1.4             sunburstR_2.1.4            
##  [5] d3r_0.9.0                   vwr_0.3.0                  
##  [7] latticeExtra_0.6-29         lattice_0.20-38            
##  [9] stringdist_0.9.5.5          ggbeeswarm_0.6.0           
## [11] ggplot2_3.3.0               dplyr_0.8.5                
## [13] readr_1.3.1                 tidyr_1.0.0                
## [15] phylotools_0.2.2            ape_5.4-1                  
## [17] maditr_0.6.3                plyr_1.8.6                 
## [19] ShortRead_1.42.0            GenomicAlignments_1.20.1   
## [21] SummarizedExperiment_1.14.1 DelayedArray_0.10.0        
## [23] matrixStats_0.55.0          Biobase_2.44.0             
## [25] Rsamtools_2.0.3             GenomicRanges_1.36.1       
## [27] GenomeInfoDb_1.20.0         Biostrings_2.52.0          
## [29] XVector_0.24.0              IRanges_2.18.3             
## [31] S4Vectors_0.22.1            BiocParallel_1.18.1        
## [33] BiocGenerics_0.30.0         seqinr_3.6-1               
## 
## loaded via a namespace (and not attached):
##  [1] colorspace_1.4-1       hwriter_1.3.2          ellipsis_0.3.0        
##  [4] class_7.3-15           farver_2.0.1           prodlim_2019.11.13    
##  [7] lubridate_1.7.4        codetools_0.2-16       splines_3.6.3         
## [10] knitr_1.30             ade4_1.7-13            jsonlite_1.7.1        
## [13] pROC_1.16.1            caret_6.0-85           png_0.1-7             
## [16] shiny_1.4.0            compiler_3.6.3         httr_1.4.1            
## [19] fastmap_1.0.1          assertthat_0.2.1       Matrix_1.2-18         
## [22] lazyeval_0.2.2         later_1.1.0.1          htmltools_0.5.0       
## [25] tools_3.6.3            gtable_0.3.0           glue_1.4.2            
## [28] GenomeInfoDbData_1.2.1 reshape2_1.4.4         Rcpp_1.0.5            
## [31] vctrs_0.2.4            nlme_3.1-143           iterators_1.0.12      
## [34] crosstalk_1.0.0        timeDate_3043.102      gower_0.2.1           
## [37] xfun_0.19              stringr_1.4.0          mime_0.9              
## [40] lifecycle_0.2.0        zlibbioc_1.30.0        MASS_7.3-51.5         
## [43] scales_1.1.0           ipred_0.9-9            promises_1.1.1        
## [46] hms_0.5.3              RColorBrewer_1.1-2     yaml_2.2.1            
## [49] rpart_4.1-15           stringi_1.5.3          foreach_1.4.7         
## [52] e1071_1.7-4            lava_1.6.6             rlang_0.4.8           
## [55] pkgconfig_2.0.3        bitops_1.0-6           evaluate_0.14         
## [58] purrr_0.3.3            recipes_0.1.9          htmlwidgets_1.5.2     
## [61] tidyselect_1.1.0       magrittr_1.5           R6_2.5.0              
## [64] generics_0.0.2         pillar_1.4.3           withr_2.1.2           
## [67] survival_3.1-8         RCurl_1.95-4.12        nnet_7.3-12           
## [70] crayon_1.3.4           rmarkdown_2.5          jpeg_0.1-8.1          
## [73] grid_3.6.3             data.table_1.12.8      ModelMetrics_1.2.2.1  
## [76] digest_0.6.27          xtable_1.8-4           httpuv_1.5.4          
## [79] munsell_0.5.0          beeswarm_0.2.3         viridisLite_0.3.0     
## [82] vipor_0.4.5